Scaling Properties of Common Statistical Operators for Gridded Datasets
Abstract
An accurate cost model that accounts for dataset size and structure can help optimize geoscience data analysis. We develop and apply a computational model to estimate data analysis costs for arithmetic operations on gridded datasets typical of satellite or climate-model origin. For these dataset geometries, our model predicts data reduction scalings that agree with measurements of widely used geoscience data processing software, the netCDF Operators (NCO). I/O performance and library design dominate throughput for simple analysis (e.g., dataset differencing). Dataset structure can reduce analysis throughput ten-fold relative to same-sized unstructured datasets. We demonstrate algorithmic optimizations that substantially increase throughput for more complex, arithmetic-dominated analysis such as weighted averaging of multi-dimensional data. These scaling properties can help to estimate costs of distribution strategies for data reduction in cluster and grid environments.
Journal: IJHPCA
Volume: 21
Publication date: 2007